-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additions to agent protocol #717
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm 🚀
string participant_metadata = 6; | ||
string participant_name = 4; | ||
string participant_identity = 5; | ||
string participant_metadata = 6; | ||
} | ||
|
||
message UpdateJobStatus { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't implemented in the agents client... https://github.com/search?q=org%3Alivekit%20UpdateJobStatus&type=code
since every job will have a participant connection we could maybe remove this and use connection status?
- if a participant is connected the job is ok
- if the client disconnects uncleanly it was in error and we should reassign it
- if the client disconnects cleanly the job is done and completed successfully
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may be OK, but we may want to implement Job metadata updates in the future (not a priority for now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
job metadata is immutable. the agent can use participant metadata or its own external storage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stepping back: what information is conveyed in the participant*
fields? Aren't the participants identified by the Job? If not, what participant is this referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was referring only to UpdateJobStatus
- gh included irrelevant lines. to answer your question though - the agent service sends an availability request to the client. if the client is available to handle the job the response includes participant info used to create a token. the token is then passed back to the client in the job assignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. This PR deprecates the field for now. from the above, it looks like I should rather remove it entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably clean it up in lk first - the service implements it - but yeah, i think so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Next step is to implement this in lk anyway. Probably makes sense to do it before I merge this.
protobufs/livekit_agent.proto
Outdated
message StartAgentJobRequest { | ||
JobType type = 1; | ||
Room room = 2; | ||
optional ParticipantInfo participant = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably only need publisher identity here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for just the room name mb?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
protobufs/livekit_agent.proto
Outdated
Room room = 2; | ||
optional ParticipantInfo participant = 3; | ||
string namespace = 4; | ||
map<string, string> metadata = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just be a string? (let user serialize stuff if they want)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently use a map for egress, a string for other entities (participant, etc). Either has pros and cons. The map lets us potentially stash fields in there as well using a reserved namespace. The string is more flexible...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in cases where we might be tempted to extend the user supplied metadata with values from the service we should probably add structured fields. for example we might include an optional field for sip data with a struct containing phone #, trunk id, etc...
e6cb753
to
e482780
Compare
7a896ce
to
2901cd8
Compare
4021d70
to
c9cef7e
Compare
804d62e
to
6d19b66
Compare
3138bc2
to
15bbb5c
Compare
protobufs/livekit_models.proto
Outdated
@@ -266,7 +266,8 @@ message SipDTMF { | |||
} | |||
|
|||
message Transcription { | |||
string participant_identity = 2; | |||
// Participant that got its speech transcribed | |||
string transcribed_participant_identity = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems redundant to me? everything in a Transcription should be scoped to the transcription.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a part of #706 probably. It was confusing to see two participant_identities
in data packets in SDK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that.. would comments in the protobuf help to clear that up? Just looking at a Transcription
object (without context on what is using the message), it seems to break some naming patterns to have transcribed_
again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe source_
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that sgtm!
protobufs/livekit_room.proto
Outdated
@@ -59,11 +60,22 @@ service RoomService { | |||
|
|||
// Update room metadata, will cause updates to be broadcasted to everyone in the room, Requires `roomAdmin` | |||
rpc UpdateRoomMetadata (UpdateRoomMetadataRequest) returns (Room); | |||
|
|||
// Create a room configuration. | |||
rpc CreateRoomConfiguration(CreateRoomConfigurationRequest) returns (RoomConfiguration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth discussing: should these be APIs on room service? or Project settings in Cloud (and config in OSS)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if they are on room, we'd have to figure out permissions.. what token permissions would they require
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A "configuration service" would work too. I have no strong opinion either way. Does anybody else have any input? Either way, we'll need to sort out permissions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would propose not to have a service for this initially. OSS would configure these in the config yaml. Cloud would store it per project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would cloud allow defining these configurations?
6945cd5
to
e5e2337
Compare
5f74d1a
to
7908d00
Compare
22089fc
to
5a410f7
Compare
protobufs/livekit_agent.proto
Outdated
// The load that the job currently has on the worker | ||
float load = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as convenient as it would be to have this does the agent client have any way to give us job load info? isn't most of the work done in a single process? @theomonnom ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Egress runs each worker in a separate process and reports load for each ingress job separately, but that took a not insignificant amount of effort to implement and may be a too high burden to put on all agent implementations. I agree we should delete it unless @theomonnom has some further input on how this would be used?
protobufs/livekit_agent.proto
Outdated
JobStatus status = 7; | ||
string error = 8; | ||
float load = 9; | ||
int64 started_at = 10; | ||
int64 ended_at = 11; | ||
int64 updated_at = 12; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure we want any of these on Job
- this is the job description passed to the worker to initialize a job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mixes Job Model and Execution status in the same message indeed. This is the pattern that is also used by Egress. Most consumers of the Job object need its current or latest state as well, which makes it convenient to have a message that makes it easy to send them together.
The IngressInfo message attempts to separate the Model and State somehow by packaging all the state related entries into a an IngressState message that is in turns included inside the Ingress message. The IngressState entry is null when the Ingress is used purely as model (which only happens at initialization).
An laternative here would be to flip the Job and JobState relationship compared to ingress:
message JobState {
JobStatus status = 1;
string error = 2;
float load = 3;
int64 started_at = 4;
int64 ended_at = 5;
int64 updated_at = 6;
Job job = 7;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes the most sense to me because the job config is an immutable property of the running process but for the sake of consistency we could compromise and copy ingress so they're at least separate objects
f435a70
to
13e271d
Compare
f5a7b25
to
101ca14
Compare
9cfdabe
to
b62ad8d
Compare
protobufs/livekit_agent.proto
Outdated
message StartAgentJobRequest { | ||
JobType type = 1; | ||
string room_name = 2; | ||
optional string participant_identity = 3; | ||
string namespace = 4; | ||
string metadata = 5; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the current implementation publisher jobs apply to all participants once they begin publishing. this doesn't fit into this api. multiple job ids could result from a single call and more jobs could start after the initial call.
i think we've accidentally overloaded the Job
concept - in the original design this represented an invocation of the agent worker. what we're creating with this api call is a rule for dispatching jobs ie start a job for the room or start a job for publishers or for one publisher with the identity x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just StartAgent
/ StartAgentRequest
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have 2 options:
- Change the Job definition to represent 1 invocation for either the whole room, or 1 participant
- Keep the current Job definition and remove the
participant_identity
argument in theStart
message (and rename the rpc indeed).
Is there a use case for starting an Agent Job for a single participant? At first glance, it seems so in context where there is a main speaker. Short of doing that, we may eventually need some kind of filter argument to tell a 3rd party agent what participants to process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current (before this PR) Job definition already has an optional ParticipantInfo
field. How are agents currently expected to use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll want the option to specify one or more publishers a job should start for. we don't want the user to have to specify the identities of every publisher in the room if they want the job to start automatically for publishers though. i think we need a new concept to describe what this api operates on - i don't have a great name for it but WorkOrder
fits the job/worker metaphor.
the publisher's ParticipantInfo
is stored in this field by the agents framework when dispatching a publisher type job.
f3112ae
to
b62ad8d
Compare
089fcd0
to
592e9e0
Compare
90eb597
to
9903c97
Compare
a404b31
to
420a644
Compare
773bbbe
to
edc26e4
Compare
a0f422c
to
2366dd8
Compare
can we add a field for the number of jobs active during the period the load sample is taken from to the what is the process for associating running jobs with a new worker connection when the websocket reconnects? should we expect a flood of |
c5934eb
to
13fd91a
Compare
Added. We have to define what the active Jon count is though: the count at capture time, the max since last capture, ...?
We have not implemented this anywhere yet AFAICT. I did change the message to use a list of ids though. |
257638b
to
e69197a
Compare
This PR:
metadata
string -> string map to Job to allow agent implementation to take anonymous (to us) parameters